Clustering the Labeled and Unlabeled Datasets using New MST based Divide and Conquer Technique
Author
Abstract
Clustering is the process of partitioning a data set into subsets called clusters, so that the data in each subset share some common properties. Clustering is an important tool for exploring the hidden structures of modern large databases. Because of the huge variety of problems and data distributions, different classical clustering algorithms have been developed, such as hierarchical, partitional, density-based and model-based approaches, and no single technique is completely satisfactory for all cases. Ample empirical evidence has shown that a New Minimum Spanning Tree (NMST) representation is quite invariant to detailed geometric changes in cluster boundaries. Therefore, the shape of a cluster has little impact on the performance of MST based clustering algorithms, which allows us to overcome many of the problems faced by the classical clustering algorithms. NMST based clustering algorithms can also detect clusters with irregular boundaries, so they are widely used in practice. In these MST based clustering algorithms, constructing the NMST requires a search for nearest neighbours. This search is the main source of computation, and the standard solutions take O(N) time. In this paper, we present a fast minimum spanning tree-inspired clustering algorithm. The algorithm uses an efficient implementation of the cut and cycle properties of the NMST, which can perform much better than O(N) time.
General Terms: Data Mining, Image Processing, Artificial Intelligence, Graph Theory and Design Analysis of Algorithms.
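To make the idea of MST based clustering concrete, here is a minimal sketch of the classical approach: build a minimum spanning tree over the points, then remove the heaviest edges so that the remaining connected components form the clusters. This is not the NMST algorithm of the paper, whose details the abstract does not give. The function names `prim_mst` and `mst_clusters`, the use of Euclidean distance, and the choice of cutting the k-1 heaviest edges are all assumptions made for illustration. The brute-force Prim construction scans every remaining point at each of its N steps, which is presumably the kind of linear nearest-neighbour search the abstract aims to speed up.

```python
import math


def prim_mst(points):
    """Return MST edges (weight, u, v) of the complete Euclidean graph.

    Brute-force Prim: N steps, each scanning all remaining points,
    so O(N^2) overall.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Linear scan for the vertex closest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax distances from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k):
    """Label points with k clusters by dropping the k-1 heaviest MST edges."""
    n = len(points)
    edges = sorted(prim_mst(points))
    kept = edges[: max(0, len(edges) - (k - 1))]

    # Union-find over the kept edges gives the connected components.
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)

    # Map each component root to a compact label 0, 1, 2, ...
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = mst_clusters(points, 2)
```

Because the clusters are defined only by which tree edges are cut, the sketch makes no assumption about cluster shape, which is the property the abstract highlights for MST based methods.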
Similar resources
Clustering in WSN Based on Minimum Spanning Tree Using Divide and Conquer Approach
Due to heavy energy constraints in WSNs, clustering is an efficient way to manage energy in sensors. Many clustering methods have already been proposed, and research continues on making clustering more energy efficient. In this paper we propose minimum spanning tree based clustering using a divide and conquer approach. The MST based clustering was first proposed in 1...
A Novel K means Clustering Algorithm for Large Datasets Based on Divide and Conquer Technique
In this paper we propose an efficient algorithm based on the divide and conquer technique for clustering large datasets. In our research we apply the divide and conquer technique to partitions of the large datasets and use squared Euclidean distance to measure the similarity between data points. The partitioning of datasets is done according to the number of cluster...
Free Vibration Analysis of Repetitive Structures using Decomposition, and Divide-Conquer Methods
This paper consists of three sections. In the first section, an efficient method is used for decomposition of the canonical matrices associated with repetitive structures. To this end, a cylindrical coordinate system and a special numbering scheme are employed. In the second section, the divide and conquer method is used for the eigensolution of these structures, where the matrices are in ...
Semi-supervised Text Categorization Using Recursive K-means Clustering
In this paper, we present a semi-supervised learning algorithm for classification of text documents. A method of labeling unlabeled text documents is presented. The presented method is based on the divide and conquer strategy. It uses a recursive K-means algorithm for partitioning both the labeled and unlabeled data collections. The K-means algorithm is applied recursively on each partiti...
A Divide and Conquer Framework for Distributed Graph Clustering
Graph clustering is about identifying clusters of closely connected nodes, and is a fundamental technique of data analysis with many applications including community detection, VLSI network partitioning, collaborative filtering, etc. In order to improve the scalability of existing graph clustering algorithms, we propose a novel divide and conquer framework for graph clustering, and establish th...